home *** CD-ROM | disk | FTP | other *** search
Text File | 1990-05-21 | 63.8 KB | 1,736 lines |
- % This program by D. E. Knuth is not copyrighted and can be used freely.
- % Version 0 was implemented in January 1982.
- % In February 1982 a new restriction on ligature steps was added.
- % In June 1982 the routines were divided into smaller pieces for IBM people,
- % and the result was designated "Version 1" in September 1982.
- % Slight changes were made in October, 1982, for version 0.6 of TeX.
- % Version 2 (July 1983) was released with TeX version 0.999.
- % Version 2.1 (September 1983) changed TEXINFO to FONTDIMEN.
- % Version 2.2 (February 1984) simplified decimal fraction output.
- % Version 2.3 (May 1984) fixed a bug when lh=17.
- % Version 2.4 (July 1984) fixed a bug involving unused ligature code.
- % Version 2.5 (September 1985) updated the standard codingscheme names.
- % Version 3 (October 1989) introduced new ligature capabilities.
- % Version 3.1 (November 1989) renamed z[] to lig_z[] for better portability.
-
- % Here is TeX material that gets inserted after \input webmac
- \def\hang{\hangindent 3em\indent\ignorespaces}
- \font\ninerm=cmr9
- \let\mc=\ninerm % medium caps for names like SAIL
- \def\PASCAL{Pascal}
-
- \def\(#1){} % this is used to make section names sort themselves better
- \def\9#1{} % this is used for sort keys in the index
-
- \def\title{TF\lowercase{to}PL}
- \def\contentspagenumber{201}
- \def\topofcontents{\null
- \def\titlepage{F} % include headline on the contents page
- \def\rheader{\mainfont\hfil \contentspagenumber}
- \vfill
- \centerline{\titlefont The {\ttitlefont TFtoPL} processor}
- \vskip 15pt
- \centerline{(Version 3.1, November 1989)}
- \vfill}
- \def\botofcontents{\vfill
- \centerline{\hsize 5in\baselineskip9pt
- \vbox{\ninerm\noindent
- The preparation of this report
- was supported in part by the National Science
- Foundation under grants IST-8201926 and MCS-8300984,
- and by the System Development Foundation. `\TeX' is a
- trademark of the American Mathematical Society.}}}
- \pageno=\contentspagenumber \advance\pageno by 1
-
- @* Introduction.
- The \.{TFtoPL} utility program converts \TeX\ font metric (``\.{TFM}'')
- files into equivalent property-list (``\.{PL}'') files. It also
- makes a thorough check of the given \.{TFM} file, using essentially the
- same algorithm as \TeX. Thus if \TeX\ complains that a \.{TFM}
- file is ``bad,'' this program will pinpoint the source or sources of
- badness. A \.{PL} file output by this program can be edited with
- a normal text editor, and the result can be converted back to \.{TFM}
- format using the companion program \.{PLtoTF}.
-
- The first \.{TFtoPL} program was designed by Leo Guibas in the summer of
- 1978. Contributions by Frank Liang, Doug Wyatt, and Lyle Ramshaw
- also had a significant effect on the evolution of the present code.
-
- Extensions for an enhanced ligature mechanism were added by the author in 1989.
-
- The |banner| string defined here should be changed whenever \.{TFtoPL}
- gets modified.
-
- @d @!banner "This is TFtoPL, Version 3.1" /* printed when the program starts */
- @d @!local_banner "Local Version"
-
- @ This program is written entirely in standard \PASCAL, except that
- it occasionally has lower case letters in strings that are output.
- Such letters can be converted to upper case if necessary. The input is read
- from |tfm_file|, and the output is written on |pl_file|; error messages and
- other remarks are written on the |output| file, which the user may
- choose to assign to the terminal if the system permits it.
- @^system dependencies@>
-
- The term |print| is used instead of |write| when this program writes on
- the |output| file, so that all such output can be easily deflected.
- @d NAME_LENGTH 120
- @d odd(a) ((a)&0x1)
- @c
- #include <stdio.h>
- #include <stdlib.h>
- #include <string.h>
- #include "portab.h"
- @#
- FILE * @!output = stdout;
- char tfm_name[NAME_LENGTH], pl_name[NAME_LENGTH], output_name[NAME_LENGTH];
- @#
- @<Types in the outer block@>@/
- @<Globals in the outer block@>@/
- void initialize(void) /* this procedure gets things started properly */
- { if (output_name[0] != '\0') {
- output = fopen(output_name, "w");
- if (output == NULL) {
- fprintf(stderr, "I can't open log file \"%s\"\n", output_name);
- fprintf(stderr, "The output will appear on the terminal\n");
- output = stdout;
- }
- }
- fprintf(output, "%s\n", banner);@/
- fprintf(output, "%s\n", local_banner);@/
- @<Set initial values@>@/
- }
-
- @ If the program has to stop prematurely, it goes to the
- `|final_end|'.
-
- @ The following parameters can be changed at compile time to extend or
- reduce \.{TFtoPL}'s capacity.
- @d @!tfm_size 30000 /* maximum length of |tfm| data, in bytes */
- @d @!lig_size 5000 /* maximum length of |lig_kern| program, in words */
- @d @!hash_size 5003 /* preferably a prime number, a bit larger than the number
- of character pairs in lig/kern steps */
-
- @
- @<Types...@>=
- typedef UWORD @!tfm_size_type;
- typedef UWORD @!lig_size_type;
- typedef UWORD @!hash_size_type;
-
- @ Here are some macros for common programming idioms.
-
- @d incr(a) (a)++ /* increase a variable by unity */
- @d decr(a) (a)-- /* decrease a variable by unity */
- @d do_nothing /* empty statement */
-
- @* Font metric data.
- The idea behind \.{TFM} files is that typesetting routines like \TeX\
- need a compact way to store the relevant information about several
- dozen fonts, and computer centers need a compact way to store the
- relevant information about several hundred fonts. \.{TFM} files are
- compact, and most of the information they contain is highly relevant,
- so they provide a solution to the problem.
-
- The information in a \.{TFM} file appears in a sequence of 8-bit bytes.
- Since the number of bytes is always a multiple of 4, we could
- also regard the file as a sequence of 32-bit words; but \TeX\ uses the
- byte interpretation, and so does \.{TFtoPL}. Note that the bytes
- are considered to be unsigned numbers.
-
- @<Glob...@>=
- FILE * @!tfm_file;
-
- @ On some systems you may have to do something special to read a
- packed file of bytes. For example, the following code didn't work
- when it was first tried at Stanford, because packed files have to be
- opened with a special switch setting on the \PASCAL\ that was used.
- @^system dependencies@>
-
- @<Set init...@>=
- tfm_file = fopen(tfm_name, "rb");
- if (tfm_file == NULL) {
- fprintf(stderr, "I can't open TFM file \"%s\"\n", tfm_name);
- exit(1);
- }
-
- @ The first 24 bytes (6 words) of a \.{TFM} file contain twelve 16-bit
- integers that give the lengths of the various subsequent portions
- of the file. These twelve integers are, in order:
- $$\vbox{\halign{\hfil#&$\null=\null$#\hfil\cr
- |@!lf|&length of the entire file, in words;\cr
- |@!lh|&length of the header data, in words;\cr
- |@!bc|&smallest character code in the font;\cr
- |@!ec|&largest character code in the font;\cr
- |@!nw|&number of words in the width table;\cr
- |@!nh|&number of words in the height table;\cr
- |@!nd|&number of words in the depth table;\cr
- |@!ni|&number of words in the italic correction table;\cr
- |@!nl|&number of words in the ligfdivkern table;\cr
- |@!nk|&number of words in the kern table;\cr
- |@!ne|&number of words in the extensible character table;\cr
- |@!np|&number of font parameter words.\cr}}$$
- They are all nonnegative and less than $2^{15}$. We must have |bc-1<=ec<=255|,
- |ne<=256|, and
- $$\hbox{|lf=6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np|.}$$
- Note that a font may contain as many as 256 characters (if |bc=0| and |ec=255|),
- and as few as 0 characters (if |bc=ec+1|).
-
- Incidentally, when two or more 8-bit bytes are combined to form an integer of
- 16 or more bits, the most significant bytes appear first in the file.
- This is called BigEndian order.
-
- @<Glob...@>=
- UWORD @!lf,@!lh,@!bc,@!ec,@!nw,@!nh,@!nd,@!ni,@!nl,@!nk,@!ne,@!np;
- /* subfile sizes */
-
- @ The rest of the \.{TFM} file may be regarded as a sequence of ten data
- arrays having the informal specification
- $$\def\arr$[#1]#2${\&{array} $[#1]$ \&{of} #2}
- \vbox{\halign{\hfil\\{#}&$\,:\,$\arr#\hfil\cr
- header&|[0..lh-1]stuff|\cr
- char\_info&|[bc..ec]char_info_word|\cr
- width&|[0..nw-1]fix_word|\cr
- height&|[0..nh-1]fix_word|\cr
- depth&|[0..nd-1]fix_word|\cr
- italic&|[0..ni-1]fix_word|\cr
- lig\_kern&|[0..nl-1]lig_kern_command|\cr
- kern&|[0..nk-1]fix_word|\cr
- exten&|[0..ne-1]extensible_recipe|\cr
- param&|[1..np]fix_word|\cr}}$$
- The most important data type used here is a |@!fix_word|, which is
- a 32-bit representation of a binary fraction. A |fix_word| is a signed
- quantity, with the two's complement of the entire word used to represent
- negation. Of the 32 bits in a |fix_word|, exactly 12 are to the left of the
- binary point; thus, the largest |fix_word| value is $2048-2^{-20}$, and
- the smallest is $-2048$. We will see below, however, that all but one of
- the |fix_word| values will lie between $-16$ and $+16$.
-
- @ The first data array is a block of header information, which contains
- general facts about the font. The header must contain at least two words,
- and for \.{TFM} files to be used with Xerox printing software it must
- contain at least 18 words, allocated as described below. When different
- kinds of devices need to be interfaced, it may be necessary to add further
- words to the header block.
-
- \yskip\hang|header[0]| is a 32-bit check sum that \TeX\ will copy into the
- \.{DVI} output file whenever it uses the font. Later on when the \.{DVI}
- file is printed, possibly on another computer, the actual font that gets
- used is supposed to have a check sum that agrees with the one in the
- \.{TFM} file used by \TeX. In this way, users will be warned about
- potential incompatibilities. (However, if the check sum is zero in either
- the font file or the \.{TFM} file, no check is made.) The actual relation
- between this check sum and the rest of the \.{TFM} file is not important;
- the check sum is simply an identification number with the property that
- incompatible fonts almost always have distinct check sums.
- @^check sum@>
-
- \yskip\hang|header[1]| is a |fix_word| containing the design size of the
- font, in units of \TeX\ points (7227 \TeX\ points = 254 cm). This number
- must be at least 1.0; it is fairly arbitrary, but usually the design size
- is 10.0 for a ``10 point'' font, i.e., a font that was designed to look
- best at a 10-point size, whatever that really means. When a \TeX\ user
- asks for a font `\.{at} $\delta$ \.{pt}', the effect is to override the
- design size and replace it by $\delta$, and to multiply the $x$ and~$y$
- coordinates of the points in the font image by a factor of $\delta$
- divided by the design size. {\sl All other dimensions in the\fdiv\ \.{TFM}
- file are |fix_word|\kern-1pt\ numbers in design-size units.} Thus, for example,
- the value of |param[6]|, one \.{em} or \.{\\quad}, is often the |fix_word|
- value $2^{20}=1.0$, since many fonts have a design size equal to one em.
- The other dimensions must be less than 16 design-size units in absolute
- value; thus, |header[1]| and |param[1]| are the only |fix_word| entries in
- the whole \.{TFM} file whose first byte might be something besides 0 or
- 255. @^design size@>
-
- \yskip\hang|header[2..11]|, if present, contains 40 bytes that identify
- the character coding scheme. The first byte, which must be between 0 and
- 39, is the number of subsequent ASCII bytes actually relevant in this
- string, which is intended to specify what character-code-to-symbol
- convention is present in the font. Examples are \.{ASCII} for standard
- ASCII, \.{TeX text} for fonts like \.{cmr10} and \.{cmti9}, \.{TeX math
- extension} for \.{cmex10}, \.{XEROX text} for Xerox fonts, \.{GRAPHIC} for
- special-purpose non-alphabetic fonts, \.{UNSPECIFIED} for the default case
- when there is no information. Parentheses should not appear in this name.
- (Such a string is said to be in {\mc BCPL} format.)
- @^coding scheme@>
-
- \yskip\hang|header[12..16]|, if present, contains 20 bytes that name the
- font family (e.g., \.{CMR} or \.{HELVETICA}), in {\mc BCPL} format.
- This field is also known as the ``font identifier.''
- @^family name@>
- @^font identifier@>
-
- \yskip\hang|header[17]|, if present, contains a first byte called the
- |seven_bit_safe_flag|, then two bytes that are ignored, and a fourth byte
- called the |face|. If the value of the fourth byte is less than 18, it has
- the following interpretation as a ``weight, slope, and expansion'': Add 0
- or 2 or 4 (for medium or bold or light) to 0 or 1 (for roman or italic) to
- 0 or 6 or 12 (for regular or condensed or extended). For example, 13 is
- 0+1+12, so it represents medium italic extended. A three-letter code
- (e.g., \.{MIE}) can be used for such |face| data.
-
- \yskip\hang|header[18..@twhatever@>]| might also be present; the individual
- words are simply called |header[18]|, |header[19]|, etc., at the moment.
-
- @ Next comes the |char_info| array, which contains one |char_info_word|
- per character. Each |char_info_word| contains six fields packed into
- four bytes as follows.
-
- \yskip\hang first byte: |width_index| (8 bits)\par
- \hang second byte: |height_index| (4 bits) times 16, plus |depth_index|
- (4~bits)\par
- \hang third byte: |italic_index| (6 bits) times 4, plus |tag|
- (2~bits)\par
- \hang fourth byte: |remainder| (8 bits)\par
- \yskip\noindent
- The actual width of a character is |width[width_index]|, in design-size
- units; this is a device for compressing information, since many characters
- have the same width. Since it is quite common for many characters
- to have the same height, depth, or italic correction, the \.{TFM} format
- imposes a limit of 16 different heights, 16 different depths, and
- 64 different italic corrections.
-
- Incidentally, the relation |width[0]=height[0]=depth[0]=italic[0]=0|
- should always hold, so that an index of zero implies a value of zero.
- The |width_index| should never be zero unless the character does
- not exist in the font, since a character is valid if and only if it lies
- between |bc| and |ec| and has a nonzero |width_index|.
-
- @ The |tag| field in a |char_info_word| has four values that explain how to
- interpret the |remainder| field.
-
- \yskip\hang|tag=0| (|no_tag|) means that |remainder| is unused.\par
- \hang|tag=1| (|lig_tag|) means that this character has a ligaturefdivkerning
- program starting at |lig_kern[remainder]|.\par
- \hang|tag=2| (|list_tag|) means that this character is part of a chain of
- characters of ascending sizes, and not the largest in the chain. The
- |remainder| field gives the character code of the next larger character.\par
- \hang|tag=3| (|ext_tag|) means that this character code represents an
- extensible character, i.e., a character that is built up of smaller pieces
- so that it can be made arbitrarily large. The pieces are specified in
- |exten[remainder]|.\par
-
- @d no_tag 0 /* vanilla character */
- @d lig_tag 1 /* character has a ligature/kerning program */
- @d list_tag 2 /* character has a successor in a charlist */
- @d ext_tag 3 /* character is extensible */
-
- @ The |lig_kern| array contains instructions in a simple programming language
- that explains what to do for special letter pairs. Each word is a
- |lig_kern_command| of four bytes.
-
- \yskip\hang first byte: |skip_byte|, indicates that this is the final program
- step if the byte is 128 or more, otherwise the next step is obtained by
- skipping this number of intervening steps.\par
- \hang second byte: |next_char|, ``if |next_char| follows the current character,
- then perform the operation and stop, otherwise continue.''\par
- \hang third byte: |op_byte|, indicates a ligature step if less than~128,
- a kern step otherwise.\par
- \hang fourth byte: |remainder|.\par
- \yskip\noindent
- In a kern step, an
- additional space equal to |kern[256*(op_byte-128)+remainder]| is inserted
- between the current character and |next_char|. This amount is
- often negative, so that the characters are brought closer together
- by kerning; but it might be positive.
-
- There are eight kinds of ligature steps, having |op_byte| codes $4a+2b+c$ where
- $0\le a\le b+c$ and $0\le b,c\le1$. The character whose code is
- |remainder| is inserted between the current character and |next_char|;
- then the current character is deleted if $b=0$, and |next_char| is
- deleted if $c=0$; then we pass over $a$~characters to reach the next
- current character (which may have a ligaturefdivkerning program of its own).
-
- Notice that if $a=0$ and $b=1$, the current character is unchanged; if
- $a=b$ and $c=1$, the current character is changed but the next character is
- unchanged. \.{TFtoPL} will check to see that infinite loops are avoided.
-
- If the very first instruction of the |lig_kern| array has |skip_byte=255|,
- the |next_char| byte is the so-called right boundary character of this font;
- the value of |next_char| need not lie between |bc| and~|ec|.
- If the very last instruction of the |lig_kern| array has |skip_byte=255|,
- there is a special ligaturefdivkerning program for a left boundary character,
- beginning at location |256*op_byte+remainder|.
- The interpretation is that \TeX\ puts implicit boundary characters
- before and after each consecutive string of characters from the same font.
- These implicit characters do not appear in the output, but they can affect
- ligatures and kerning.
-
- If the very first instruction of a character's |lig_kern| program has
- |skip_byte>128|, the program actually begins in location
- |256*op_byte+remainder|. This feature allows access to large |lig_kern|
- arrays, because the first instruction must otherwise
- appear in a location |<=255|.
-
- Any instruction with |skip_byte>128| in the |lig_kern| array must have
- |256*op_byte+remainder<nl|. If such an instruction is encountered during
- normal program execution, it denotes an unconditional halt; no ligature
- command is performed.
-
- @d stop_flag 128 /* value indicating `\.{STOP}' in a lig/kern program */
- @d kern_flag 128 /* op code for a kern step */
-
- @ Extensible characters are specified by an |extensible_recipe|,
- which consists of four bytes called |top|, |mid|,
- |bot|, and |rep| (in this order). These bytes are the character codes
- of individual pieces used to build up a large symbol.
- If |top|, |mid|, or |bot| are zero,
- they are not present in the built-up result. For example, an extensible
- vertical line is like an extensible bracket, except that the top and
- bottom pieces are missing.
-
-
- @ The final portion of a \.{TFM} file is the |param| array, which is another
- sequence of |fix_word| values.
-
- \yskip\hang|param[1]=@!slant| is the amount of italic slant, which is used
- to help position accents. For example, |slant=.25| means that when you go
- up one unit, you also go .25 units to the right. The |slant| is a pure
- number; it's the only |fix_word| other than the design size itself that is
- not scaled by the design size.
-
- \hang|param[2]=space| is the normal spacing between words in text.
- Note that character |" "| in the font need not have anything to do with
- blank spaces.
-
- \hang|param[3]=space_stretch| is the amount of glue stretching between words.
-
- \hang|param[4]=space_shrink| is the amount of glue shrinking between words.
-
- \hang|param[5]=x_height| is the height of letters for which accents don't
- have to be raised or lowered.
-
- \hang|param[6]=quad| is the size of one em in the font.
-
- \hang|param[7]=extra_space| is the amount added to |param[2]| at the
- ends of sentences.
-
- When the character coding scheme is \.{TeX math symbols}, the font is
- supposed to have 15 additional parameters called |num1|, |num2|, |num3|,
- |denom1|, |denom2|, |sup1|, |sup2|, |sup3|, |sub1|, |sub2|, |supdrop|,
- |subdrop|, |delim1|, |delim2|, and |axis_height|, respectively. When the
- character coding scheme is \.{TeX math extension}, the font is supposed to
- have six additional parameters called |default_rule_thickness| and
- |big_op_spacing1| through |big_op_spacing5|.
-
- @ So that is what \.{TFM} files hold. The next question is, ``What about
- \.{PL} files?'' A complete answer to that question appears in the
- documentation of the companion program, \.{PLtoTF}, so it will not
- be repeated here. Suffice it to say that a \.{PL} file is an ordinary
- \PASCAL\ text file, and that the output of \.{TFtoPL} uses only a
- subset of the possible constructions that might appear in a \.{PL} file.
- Furthermore, hardly anybody really wants to look at the formal
- definition of \.{PL} format, because it is almost self-explanatory when
- you see an example or two.
-
- @<Glob...@>=
- FILE * @!pl_file;
-
- @ @<Set init...@>=
- pl_file = fopen(pl_name, "w");
- if (pl_file == NULL) {
- fprintf(stderr, "I can't open the PL file \"%s\"\n", pl_name);
- exit(1);
- }
-
- @* Unpacked representation.
- The first thing \.{TFtoPL} does is read the entire |tfm_file| into an array of
- bytes, |tfm[0..(4*lf-1)]|.
-
- @<Types...@>=
- typedef UBYTE @!byte; /* unsigned eight-bit quantity */
- typedef tfm_size_type @!index; /* address of a byte in |tfm| */
-
- @
- @d tfm(a) internal_tfm[a+1000]
-
- @ @<Glob...@>=
- byte @!internal_tfm[tfm_size+1002]; /* the input data all goes here */
- /* the negative addresses avoid range checks for invalid characters */
-
- @ The input may, of course, be all screwed up and not a \.{TFM} file
- at all. So we begin cautiously.
-
- @d pabort(a) { fprintf(output, "%s\n", a);
- fprintf(output, "%s\n", "Sorry, but I can't go on; are you sure this is a TFM?");
- goto final_end;
- }
- @d pabortp(a, b) { fprintf(output, "%s\n", a, b);
- fprintf(output, "%s\n", "Sorry, but I can't go on; are you sure this is a TFM?");
- goto final_end;
- }
- @d pabortpp(a, b, c) { fprintf(output, "%s\n", a, b, c);
- fprintf(output, "%s\n", "Sorry, but I can't go on; are you sure this is a TFM?");
- goto final_end;
- }
-
- @<Read the whole input file@>=
- fread(&tfm(0), sizeof(tfm(0)), 1, tfm_file);
- if (tfm(0) > 127) pabort("The first byte of the input file exceeds 127!");
- @.The first byte...@>
- if (feof(tfm_file)) pabort("The input file is only one byte long!");
- @.The input...one byte long@>
- fread(&tfm(1), sizeof(tfm(1)), 1, tfm_file); lf = tfm(0)*0400+tfm(1);
- if (lf == 0)
- pabort("The file claims to have length zero, but that's impossible!");
- @.The file claims...@>
- if (4*lf-1 > tfm_size) pabort("The file is bigger than I can handle!");
- @.The file is bigger...@>
- for (tfm_ptr=2; tfm_ptr<4*lf; tfm_ptr++)
- { if (feof(tfm_file))
- pabort("The file has fewer bytes than it claims!");
- @.The file has fewer bytes...@>
- fread(&tfm(tfm_ptr), sizeof(tfm(tfm_ptr)), 1, tfm_file);
- }
- getc(tfm_file);
- if (!feof(tfm_file))
- { fprintf(output, "%s\n", "There's some extra junk at the end of the TFM file,");
- @.There's some extra junk...@>
- fprintf(output, "%s\n", "but I'll proceed as if it weren't there.");
- }
-
- @ After the file has been read successfully, we look at the subfile sizes
- to see if they check out.
-
- @d eval_two_bytes(a) { if (tfm(tfm_ptr) > 127)
- pabort("One of the subfile sizes is negative!");
- @.One of the subfile sizes...@>
- a = tfm(tfm_ptr)*0400+tfm(tfm_ptr+1);
- tfm_ptr += 2;
- }
-
- @<Set subfile sizes |lh|, |bc|, \dots, |np|@>=
- { tfm_ptr = 2;@/
- eval_two_bytes(lh);
- eval_two_bytes(bc);
- eval_two_bytes(ec);
- eval_two_bytes(nw);
- eval_two_bytes(nh);
- eval_two_bytes(nd);
- eval_two_bytes(ni);
- eval_two_bytes(nl);
- eval_two_bytes(nk);
- eval_two_bytes(ne);
- eval_two_bytes(np);
- if (lh < 2) pabortp("The header length is only %ld!", (long)lh);
- @.The header length...@>
- if (nl > 4*lig_size)
- pabort("The lig/kern program is longer than I can handle!");
- @.The lig/kern program...@>
- if ((bc > ec+1) || (ec > 255)) pabortpp("The character code range %ld..%ld is illegal!", (long)bc, (long)ec);
- @.The character code range...@>
- if ((nw == 0) || (nh == 0) || (nd == 0) || (ni == 0))
- pabort("Incomplete subfiles for character dimensions!");
- @.Incomplete subfiles...@>
- if (ne > 256) pabortp("There are %ld extensible recipes!", (long)ne);
- @.There are ... recipes@>
- if (lf != 6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np)
- pabort("Subfile sizes don't add up to the stated total!");
- @.Subfile sizes don't add up...@>
- }
-
- @ Once the input data successfully passes these basic checks,
- \.{TFtoPL} believes that it is a \.{TFM} file, and the conversion
- to \.{PL} format will take place. Access to the various subfiles
- is facilitated by computing the following base addresses. For example,
- the |char_info| for character |c| will start in location
- |4*(char_base+c)| of the |tfm| array.
-
- @<Globals...@>=
- LONG @!char_base,@!width_base,@!height_base,@!depth_base,@!italic_base,
- @!lig_kern_base,@!kern_base,@!exten_base,@!param_base;
- /* base addresses for the subfiles */
-
- @ @<Compute the base addresses@>=
- { char_base = 6+lh-bc;
- width_base = char_base+ec+1;
- height_base = width_base+nw;
- depth_base = height_base+nh;
- italic_base = depth_base+nd;
- lig_kern_base = italic_base+ni;
- kern_base = lig_kern_base+nl;
- exten_base = kern_base+nk;
- param_base = exten_base+ne-1;
- }
-
- @ Of course we want to define macros that suppress the detail of how the
- font information is actually encoded. Each word will be referred to by
- the |tfm| index of its first byte. For example, if |c| is a character
- code between |bc| and |ec|, then |tfm[char_info(c)]| will be the
- first byte of its |char_info|, i.e., the |width_index|; furthermore
- |width(c)| will point to the |fix_word| for |c|'s width.
-
- @d check_sum 24
- @d design_size check_sum+4
- @d scheme design_size+4
- @d family scheme+40
- @d random_word family+20
- @d char_info(a) 4*(char_base+a)
- @d width_index(a) tfm(char_info(a))
- @d nonexistent(a) ((a < bc) || (a > ec) || (width_index(a) == 0))
- @d height_index(a) (tfm(char_info(a)+1) / 16)
- @d depth_index(a) (tfm(char_info(a)+1) % 16)
- @d italic_index(a) (tfm(char_info(a)+2) / 4)
- @d tag(a) (tfm(char_info(a)+2) % 4)
- @d reset_tag(a) tfm(char_info(a)+2) = 4*italic_index(a)+no_tag
- @d remainder(a) tfm(char_info(a)+3)
- @d width(a) 4*(width_base+width_index(a))
- @d height(a) 4*(height_base+height_index(a))
- @d depth(a) 4*(depth_base+depth_index(a))
- @d italic(a) 4*(italic_base+italic_index(a))
- @d exten(a) 4*(exten_base+remainder(a))
- @d lig_step(a) 4*(lig_kern_base+(a))
- @d kern(a) 4*(kern_base+a) /* here \#\ is an index, not a character */
- @d param(a) 4*(param_base+a) /* likewise */
-
- @ One of the things we would like to do is take cognizance of fonts whose
- character coding scheme is \.{TeX math symbols} or \.{TeX math extension};
- we will set the |font_type| variable to one of the three choices
- |vanilla|, |mathsy|, or |mathex|.
-
- @d vanilla 0 /* not a special scheme */
- @d mathsy 1 /* \.{TeX math symbols} scheme */
- @d mathex 2 /* \.{TeX math extension} scheme */
-
- @<Glob...@>=
- UBYTE @!font_type; /* is this font special? */
-
- @* Basic output subroutines.
- Let us now define some procedures that will reduce the rest of \.{TFtoPL}'s
- work to a triviality.
-
- First of all, it is convenient to have an abbreviation for output to the
- \.{PL} file:
-
- @d out(a) fprintf(pl_file, a)
- @d outc(a) putc(a, pl_file)
- @d outl(a) fprintf(pl_file, "%ld", (long)(a))
-
- @ In order to stick to standard \PASCAL, we use three strings called
- |ASCII_04|, |ASCII_10|, and |ASCII_14|, in terms of which we can do the
- appropriate conversion of ASCII codes. Three other little strings are
- used to produce |face| codes like \.{MIE}.
-
- @<Glob...@>=
- char @!ASCII_04[33],
- @!ASCII_10[33],
- @!ASCII_14[33];
- /* strings for output in the user's external character set */
- char @!MBL_string[4],
- @!RI_string[4],
- @!RCE_string[4];
- /* handy string constants for |face| codes */
-
- @ @<Set init...@>=
- strcpy(ASCII_04+1, " !\"#$%&'()*+,-./0123456789:;<=>?");@/
- strcpy(ASCII_10+1, "âħABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_");@/
- strcpy(ASCII_14+1, "`abcdefghijklmnopqrstuvwxyz{|}~ ");@/
- strcpy(MBL_string+1, "MBL"); strcpy(RI_string+1, "RI ");
- strcpy(RCE_string+1, "RCE");
-
- @ The array |dig| will hold a sequence of digits to be output.
-
- @<Glob...@>=
- UBYTE @!dig[12];
-
- @ Here, in fact, are two procedures that output |dig[j-1]|$\,\ldots\,$|dig[0]|,
- given $j>0$.
-
- @c void out_digs(LONG j) /* outputs |j| digits */
- { do { decr(j); outc('0'+dig[j]);
- }
- while(j != 0);
- }
- @#
- void print_digs(LONG j) /* prints |j| digits */
- { do { decr(j); putc('0'+dig[j], output);
- }
- while(j != 0);
- }
-
- @ The |print_octal| procedure indicates how |print_digs| can be used.
- Since this procedure is used only to print character codes, it always
- produces three digits.
-
- @c void print_octal(byte c) /* prints octal value of |c| */
- {
- UBYTE j; /* index into |dig| */
- @#
- putc('\'', output); /* an apostrophe indicates the octal notation */
- for (j=0; j<=2; j++)
- { dig[j] = c % 8; c = c / 8;
- }
- print_digs(3);
- }
-
- @ A \.{PL} file has nested parentheses, and we want to format the output
- so that its structure is clear. The |level| variable keeps track of the
- depth of nesting.
-
- @<Glob...@>=
- UWORD @!level;
-
- @ @<Set init...@>=
- level = 0;
-
- @ Three simple procedures suffice to produce the desired structure in the
- output.
-
- @c void out_ln(void) /* finishes one line, indents the next */
- {
- UBYTE l;
- @#
- putc('\n', pl_file);
- for (l=1; l<=level; l++) out(" ");
- }
- @#
- void left(void) /* outputs a left parenthesis */
- { incr(level); outc('(');
- }
- @#
- void right(void) /* outputs a right parenthesis and finishes a line */
- { decr(level); outc(')'); out_ln();
- }
-
- @ The value associated with a property can be output in a variety of
- ways. For example, we might want to output a {\mc BCPL} string that
- begins in |tfm[k]|:
-
- @c void out_BCPL(index @!k) /* outputs a string, preceded by a blank space */
- {
- UBYTE l; /* the number of bytes remaining */
- @#
- outc(' '); l = tfm(k);
- while (l > 0) {
- incr(k); decr(l);
- switch (tfm(k)/040) {
- case 1: outc(ASCII_04[1+(tfm(k)%040)]); break;
- case 2: outc(ASCII_10[1+(tfm(k)%040)]); break;
- case 3: outc(ASCII_14[1+(tfm(k)%040)]); break;
- }
- }
- }
-
- @ The property value might also be a sequence of |l| bytes, beginning
- in |tfm[k]|, that we would like to output in octal notation.
- The following procedure assumes that |l<=4|, but larger values of |l|
- could be handled easily by enlarging the |dig| array and increasing
- the upper bounds on |b| and |j|.
-
- @c void out_octal(index @!k, index @!l) /* outputs |l| bytes in octal */
- {
- UWORD a; /* accumulator for bits not yet output */
- UBYTE @!b; /* the number of significant bits in |a| */
- UBYTE @!j; /* the number of digits of output */
- @#
- out(" O "); /* specify octal format */
- a = 0; b = 0; j = 0;
- while( l > 0 ) @<Reduce \(1)|l| by one, preserving the invariants@>;
- while( (a > 0) || (j == 0) )
- { dig[j] = a % 8; a = a / 8; incr(j);
- }
- out_digs(j);
- }
-
- @ @<Reduce \(1)|l|...@>=
- { decr(l);
- if (tfm(k+l) != 0)
- { while( b > 2 )
- { dig[j] = a % 8; a = a / 8; b = b-3; incr(j);
- }
- switch (b) {
- case 0: a = tfm(k+l); break;
- case 1: a = a+2*tfm(k+l); break;
- case 2: a = a+4*tfm(k+l); break;
- }
- }
- b = b+8;
- }
-
- @ The property value may be a character, which is output in octal
- unless it is a letter or a digit. This procedure is the only place
- where a lowercase letter will be output to the \.{PL} file.
- @^system dependencies@>
-
- @c void out_char(byte @!c) /* outputs a character */
- { if (font_type > vanilla)
- { tfm(0) = c; out_octal(0, 1);
- }
- else if ((c >= '0') && (c <= '9')) {
- out(" C "); outc(c); }
- else if ((c >= 'A') && (c <= 'Z')) {
- out(" C "); outc(ASCII_10[c-'A'+2]); }
- else if ((c >= 'a') && (c <= 'z')) {
- out(" C "); outc(ASCII_14[c-'a'+2]); }
- else { tfm(0) = c; out_octal(0, 1);
- }
- }
-
- @ The property value might be a ``face'' byte, which is output in the
- curious code mentioned earlier, provided that it is less than 18.
-
- @c void out_face(index @!k) /* outputs a |face| */
- {
- UBYTE s; /* the slope */
- UBYTE @!b; /* the weight and expansion */
- @#
- if (tfm(k) >= 18) out_octal(k,1);
- else { out(" F "); /* specify face-code format */
- s = tfm(k) % 2; b = tfm(k) / 2;
- outc(MBL_string[1+(b % 3)]);
- outc(RI_string[1+s]);
- outc(RCE_string[1+(b / 3)]);
- }
- }
-
- @ And finally, the value might be a |fix_word|, which is output in
- decimal notation with just enough decimal places for \.{PLtoTF}
- to recover every bit of the given |fix_word|.
-
- All of the numbers involved in the intermediate calculations of
- this procedure will be nonnegative and less than $10\cdot2^{24}$.
-
- @c void out_fix(index @!k) /* outputs a |fix_word| */
- {
- UWORD a; /* accumulator for the integer part */
- LONG @!f; /* accumulator for the fraction part */
- UBYTE @!j; /* index into |dig| */
- LONG @!delta; /* amount if allowable inaccuracy */
- @#
- out(" R "); /* specify real format */
- a = (tfm(k)*16)+(tfm(k+1) / 16);
- f = ((long)(tfm(k+1) % 16)*0400+tfm(k+2))*0400+tfm(k+3);
- if (a > 03777) @<Reduce \(2)negative to positive@>;
- @<Output the integer part, |a|, in decimal notation@>;
- @<Output the fraction part, $|f|/2^{20}$, in decimal notation@>;
- }
-
- @ The following code outputs at least one digit even if |a=0|.
-
- @<Output the integer...@>=
- { j = 0;
- do { dig[j] = a % 10; a = a / 10; incr(j);
- }
- while(a != 0);
- out_digs(j);
- }
-
- @ And the following code outputs at least one digit to the right
- of the decimal point.
-
- @<Output the fraction...@>=
- { outc('.'); f = 10*f+5; delta = 10;
- do { if (delta > 04000000L) f = f+02000000L-(delta / 2);
- outl(f / 04000000L); f = 10*(f % 04000000L); delta = delta*10;
- }
- while(f > delta);
- }
-
- @ @<Reduce \(2)negative to positive@>=
- { outc('-'); a = 010000-a;
- if (f > 0)
- { f = 04000000L-f; decr(a);
- }
- }
-
- @* Doing it.
- \TeX\ checks the information of a \.{TFM} file for validity as the
- file is being read in, so that no further checks will be needed when
- typesetting is going on. And when it finds something wrong, it justs
- calls the file ``bad,'' without identifying the nature of the problem,
- since \.{TFM} files are supposed to be good almost all of the time.
-
- Of course, a bad file shows up every now and again, and that's where
- \.{TFtoPL} comes in. This program wants to catch at least as many errors as
- \TeX\ does, and to give informative error messages besides.
- All of the errors are corrected, so that the \.{PL} output will
- be correct (unless, of course, the \.{TFM} file was so loused up
- that no attempt is being made to fathom it).
-
- @ Just before each character is processed, its code is printed in octal
- notation. Up to eight such codes appear on a line; so we have a variable
- to keep track of how many are currently there. We also keep track of
- whether or not any errors have had to be corrected.
-
- @<Glob...@>=
- UBYTE @!chars_on_line; /* the number of characters printed on the current line */
- boolean @!perfect; /* was the file free of errors? */
-
- @ @<Set init...@>=
- chars_on_line = 0;@/
- perfect = true; /* innocent until proved guilty */
-
- @ Error messages are given with the help of the |bad| and |range_error|
- and |bad_char| macros:
-
- @d bad(a) { perfect = false; if (chars_on_line > 0) fprintf(output, " \n");
- chars_on_line = 0; fprintf(output, "Bad TFM file: %s", a);
- }
- @d badp(a, b) { perfect = false; if (chars_on_line > 0) fprintf(output, " \n");
- chars_on_line = 0; fprintf(output, "Bad TFM file: "); fprintf(output, a, b);
- }
- @d badpp(a, b, c) { perfect = false; if (chars_on_line > 0) fprintf(output, " \n");
- chars_on_line = 0; fprintf(output, "Bad TFM file: "); fprintf(output, a, b, c);
- }
- @.Bad TFM file@>
- @d range_error(a) { perfect = false;
- fprintf(output, " \n%s index for character ", a);
- print_octal(c); fprintf(output, "%s\n", " is too large;");
- fprintf(output, "%s\n", "so I reset it to zero.");
- }
- @d bad_char(a, b) { perfect = false; if (chars_on_line > 0) fprintf(output, "%s\n", " ");
- chars_on_line = 0; fprintf(output, "Bad TFM file: %ld nonexistent character ", (long)(a));
- print_octal(b); fprintf(output, ".\n");
- }
- @d correct_bad_char(a, b) { perfect = false;
- if (chars_on_line > 0) fprintf(output, "%s\n", " ");
- chars_on_line = 0; fprintf(output, "Bad TFM file: %ld nonexistent character ", (long)a);
- print_octal(tfm(b)); fprintf(output, ".\n"); tfm(b) = bc;
- }
-
- @<Glob...@>=
- UWORD @!i; /* an index to words of a subfile */
- UWORD @!c; /* a random character */
- UBYTE @!d; /* byte number in a word */
- index @!k; /* a random index */
- UWORD @!r; /* a random two-byte value */
- UBYTE @!count; /* for when we need to enumerate a small set */
-
- @ There are a lot of simple things to do, and they have to be done one
- at a time, so we might as well get down to business. The first things
- that \.{TFtoPL} will put into the \.{PL} file appear in the header part.
-
- @<Do the header@>=
- { font_type = vanilla;
- if (lh >= 12)
- { @<Set the true |font_type|@>;
- if (lh >= 17)
- { @<Output the family name@>;
- if (lh >= 18) @<Output the rest of the header@>;
- }
- @<Output the character coding scheme@>;
- }
- @<Output the design size@>;
- @<Output the check sum@>;
- @<Output the |seven_bit_safe_flag|@>;
- }
-
- @ @<Output the check sum@>=
- left(); out("CHECKSUM"); out_octal(check_sum,4);
- right()
-
- @ Incorrect design sizes are changed to 10 points.
-
- @d bad_design(a) { bad("Design size "a"!");
- @.Design size wrong@>
- fprintf(output, "%s\n", "I've set it to 10 points.");
- out(" D 10");
- }
-
- @ @<Output the design size@>=
- left(); out("DESIGNSIZE");
- if (tfm(design_size) > 127) bad_design("negative")
- else if ((tfm(design_size) == 0) && (tfm(design_size+1) < 16))
- bad_design("too small")
- else out_fix(design_size);
- right();
- out("(COMMENT DESIGNSIZE IS IN POINTS)"); out_ln();
- out("(COMMENT OTHER SIZES ARE MULTIPLES OF DESIGNSIZE)"); out_ln()
- @.DESIGNSIZE IS IN POINTS@>
-
- @ Since we have to check two different {\mc BCPL} strings for validity,
- we might as well write a subroutine to make the check.
-
- @c void check_BCPL(index @!k, index @!l) /* checks a string of length |<l| */
- {
- index j; /* runs through the string */
- byte @!c; /* character being checked */
- @#
- if (tfm(k) >= l)
- { bad("String is too long; I've shortened it drastically.");
- @.String is too long...@>
- tfm(k) = 1;
- }
- for (j=k+1; j<=k+tfm(k); j++)
- { c = tfm(j);
- if ((c == '(') || (c == ')'))
- { bad("Parenthesis in string has been changed to slash.");
- @.Parenthesis...changed to slash@>
- tfm(j) = '/';
- }
- else if ((c < ' ') || (c > '~'))
- { bad("Nonstandard ASCII code has been blotted out.");
- @.Nonstandard ASCII code...@>
- tfm(j) = '?';
- }
- else if ((c >= 'a') && (c <= 'z')) tfm(j) = c+'A'-'a'; /* upper-casify letters */
- }
- }
-
- @ The |font_type| starts out |vanilla|; possibly we need to reset it.
-
- @<Set the true |font_type|@>=
- { check_BCPL(scheme,40);
- if ((tfm(scheme) >= 11) && @|(tfm(scheme+1) == 'T') && @|
- (tfm(scheme+2) == 'E') && @|(tfm(scheme+3) == 'X') && @|
- (tfm(scheme+4) == ' ') && @|(tfm(scheme+5) == 'M') && @|
- (tfm(scheme+6) == 'A') && @|(tfm(scheme+7) == 'T') && @|
- (tfm(scheme+8) == 'H') && @|(tfm(scheme+9) == ' '))
- { if ((tfm(scheme+10) == 'S') && (tfm(scheme+11) == 'Y')) font_type = mathsy;
- else if ((tfm(scheme+10) == 'E') && (tfm(scheme+11) == 'X')) font_type = mathex;
- }
- }
-
- @ @<Output the character coding scheme@>=
- left(); out("CODINGSCHEME");
- out_BCPL(scheme);
- right()
-
- @ @<Output the family name@>=
- left(); out("FAMILY");
- check_BCPL(family,20);
- out_BCPL(family);
- right()
-
- @ @<Output the rest of the header@>=
- { left(); out("FACE"); out_face(random_word+3); right();
- for (i=18; i<lh; i++)
- { left(); out("HEADER D "); outl(i);
- out_octal(check_sum+4*i,@,4); right();
- }
- }
-
- @ This program does not check to see if the |seven_bit_safe_flag| has the
- correct setting, i.e., if it really reflects the seven-bit-safety of
- the \.{TFM} file; the stated value is merely put into the \.{PL} file.
- The \.{PLtoTF} program will store a correct value and give a warning
- message if a file falsely claims to be safe.
-
- @<Output the |seven_bit_safe_flag|@>=
- if ((lh > 17) && (tfm(random_word) > 127))
- { left(); out("SEVENBITSAFEFLAG TRUE"); right();
- }
-
- @ The next thing to take care of is the list of parameters.
-
- @<Do the parameters@>=
- if (np > 0)
- { left(); out("FONTDIMEN"); out_ln();
- for (i=1; i<=np; i++) @<Check and output the $i$th parameter@>;
- right();
- }
- @<Check to see if |np| is complete for this font type@>;
-
- @ @<Check to see if |np|...@>=
- if ((font_type == mathsy) && (np != 22))
- fprintf(output, "Unusual number of fontdimen parameters for a math symbols font (%ld not 22).", (long)np);
- @.Unusual number of fontdimen...@>
- else if ((font_type == mathex) && (np != 13))
- fprintf(output, "Unusual number of fontdimen parameters for an extension font (%ld not 13).", (long)np);
-
- @ All |fix_word| values except the design size and the first parameter
- will be checked to make sure that they are less than 16.0 in magnitude,
- using the |check_fix| macro:
-
- @d check_fix(a, b) if ((tfm(a) > 0) && (tfm(a) < 255))
- { tfm(a) = 0; tfm((a)+1) = 0; tfm((a)+2) = 0; tfm((a)+3) = 0;
- badpp("%s %ld is too big;", b ,(long)i);
- fprintf(output, "%s\n", "I have set it to zero.");
- }
-
- @<Check and output the $i$th parameter@>=
- { left();
- if (i == 1) out("SLANT"); /* this parameter is not checked */
- else { check_fix(param(i), "Parameter");@/
- @.Parameter n is too big@>
- @<Output the name of parameter $i$@>;
- }
- out_fix((index)(param(i))); right();
- }
-
- @ @<Output the name...@>=
- if (i <= 7) switch (i) {
- case 2: out("SPACE"); break;
- case 3: out("STRETCH"); break;
- case 4: out("SHRINK"); break;
- case 5: out("XHEIGHT"); break;
- case 6: out("QUAD"); break;
- case 7: out("EXTRASPACE"); break;
- }
- else if ((i <= 22) && (font_type == mathsy)) switch (i) {
- case 8: out("NUM1"); break;
- case 9: out("NUM2"); break;
- case 10: out("NUM3"); break;
- case 11: out("DENOM1"); break;
- case 12: out("DENOM2"); break;
- case 13: out("SUP1"); break;
- case 14: out("SUP2"); break;
- case 15: out("SUP3"); break;
- case 16: out("SUB1"); break;
- case 17: out("SUB2"); break;
- case 18: out("SUPDROP"); break;
- case 19: out("SUBDROP"); break;
- case 20: out("DELIM1"); break;
- case 21: out("DELIM2"); break;
- case 22: out("AXISHEIGHT"); break;
- }
- else if ((i <= 13) && (font_type == mathex))
- if (i == 8) out("DEFAULTRULETHICKNESS");
- else { out("BIGOPSPACING"); outl(i-8);}
- else {out("PARAMETER D "); outl(i);}
-
- @ We need to check the range of all the remaining |fix_word| values,
- and to make sure that |width[0]=0|, etc.
-
- @d nonzero_fix(a) (tfm(a) > 0) || (tfm(a+1) > 0) || (tfm(a+2) > 0) || (tfm(a+3) > 0)
-
- @<Check the |fix_word| entries@>=
- if (nonzero_fix(4*width_base)) bad("width[0] should be zero.");
- @.should be zero@>
- if (nonzero_fix(4*height_base)) bad("height[0] should be zero.");
- if (nonzero_fix(4*depth_base)) bad("depth[0] should be zero.");
- if (nonzero_fix(4*italic_base)) bad("italic[0] should be zero.");
- for (i=0; i<nw; i++) check_fix(4*(width_base+i), "Width");
- @.Width n is too big@>
- for (i=0; i<nh; i++) check_fix(4*(height_base+i), "Height");
- @.Height n is too big@>
- for (i=0; i<nd; i++) check_fix(4*(depth_base+i), "Depth");
- @.Depth n is too big@>
- for (i=0; i<ni; i++) check_fix(4*(italic_base+i), "Italic correction");
- @.Italic correction n is too big@>
- if (nk > 0) for (i=0; i<nk; i++) check_fix(kern(i), "Kern");
- @.Kern n is too big@>
-
- @ The ligaturefdivkerning program comes next. Before we can put it out in
- \.{PL} format, we need to make a table of ``labels'' that will be inserted
- into the program. For each character |c| whose |tag| is |lig_tag| and
- whose starting address is |r|, we will store the pair |(c,r)| in the
- |label_table| array. If there's a boundary-char program starting at~|r|,
- we also store the pair |(256,r)|.
- This array is sorted by its second components, using the
- simple method of straight insertion.
-
- @<Glob...@>=
- struct {
- UWORD @!cc;
- lig_size_type @!rr;
- } @!label_table[259];
- UWORD @!label_ptr; /* the largest entry in |label_table| */
- UWORD @!sort_ptr; /* index into |label_table| */
- UWORD @!boundary_char; /* boundary character, or 256 if none */
- UWORD @!bchar_label; /* beginning of boundary character program */
-
- @ @<Set init...@>=
- boundary_char = 256; bchar_label = (UWORD)(077777L);@/
- label_ptr = 0; label_table[0].rr = 0; /* a sentinel appears at the bottom */
-
- @ We'll also identify and remove inaccessible program steps, using the
- |activity| array.
-
- @d unreachable 0 /* a program step not known to be reachable */
- @d pass_through 1 /* a program step passed through on initialization */
- @d accessible 2 /* a program step that can be relevant */
-
- @<Glob...@>=
- UBYTE @!activity[lig_size+1];
- lig_size_type @!ai, @!acti; /* indices into |activity| */
-
- @ @<Do the ligatures and kerns@>=
- if (nl > 0)
- { for (ai=0; ai<nl; ai++) activity[ai] = unreachable;
- @<Check for a boundary char@>;
- }
- @<Build the label table@>;
- if (nl > 0)
- { left(); out("LIGTABLE"); out_ln();@/
- @<Compute the |activity| array@>;
- @<Output and correct the ligature/kern program@>;
- right();
- @<Check for ligature cycles@>;
- }
-
- @ We build the label table even when |nl=0|, because this catches errors
- that would not otherwise be detected.
-
- @<Build...@>=
- for (c=bc; c<=ec; c++) if (tag(c) == lig_tag)
- { r = remainder(c);
- if (r < nl)
- { if (tfm(lig_step(r)) > stop_flag)
- { r = 256*tfm(lig_step(r)+2)+tfm(lig_step(r)+3);
- if (r < nl) if (activity[remainder(c)] == unreachable)
- activity[remainder(c)] = pass_through;
- }
- }
- if (r >= nl)
- { perfect = false; fprintf(output, "%s\n", " ");
- fprintf(output, "Ligature/kern starting index for character "); print_octal(c);
- fprintf(output, "%s\n", " is too large;"); fprintf(output, "%s\n", "so I removed it."); reset_tag(c);
- @.Ligature/kern starting index...@>
- }
- else @<Insert |(c,r)| into |label_table|@>;
- }
- label_table[label_ptr+1].rr = lig_size; /* put ``infinite'' sentinel at the end */
-
- @ @<Insert |(c,r)|...@>=
- { sort_ptr = label_ptr; /* there's a hole at position |sort_ptr+1| */
- while( label_table[sort_ptr].rr > r )
- { label_table[sort_ptr+1] = label_table[sort_ptr];
- decr(sort_ptr); /* move the hole */
- }
- label_table[sort_ptr+1].cc = c;
- label_table[sort_ptr+1].rr = r; /* fill the hole */
- incr(label_ptr); activity[r] = accessible;
- }
-
- @ @<Check for a bound...@>=
- if (tfm(lig_step(0)) == 255)
- { left(); out("BOUNDARYCHAR");
- boundary_char = tfm(lig_step(0)+1); out_char(boundary_char); right();
- activity[0] = pass_through;
- }
- if (tfm(lig_step(nl-1)) == 255)
- { r = 256*tfm(lig_step(nl-1)+2)+tfm(lig_step(nl-1)+3);
- if (r >= nl)
- { perfect = false; fprintf(output, "%s\n", " ");
- fprintf(output, "Ligature/kern starting index for boundarychar is too large;");
- fprintf(output, "%s\n", "so I removed it.");
- @.Ligature/kern starting index...@>
- }
- else { label_ptr = 1; label_table[1].cc = 256; label_table[1].rr = r;
- bchar_label = r; activity[r] = accessible;
- }
- activity[nl-1] = pass_through;
- }
-
- @ @<Compute the |activity| array@>=
- for (ai=0; ai<nl; ai++) if (activity[ai] == accessible)
- { r = tfm(lig_step(ai));
- if (r < stop_flag)
- { r = r+ai+1;
- if (r >= nl)
- { badp("Ligature/kern step %ld skips too far;", (long)ai);
- @.Lig...skips too far@>
- fprintf(output, "%s\n", "I made it stop."); tfm(lig_step(ai)) = stop_flag;
- }
- else activity[r] = accessible;
- }
- }
-
- @ We ignore |pass_through| items, which don't need to be mentioned in
- the \.{PL} file.
-
- @<Output and correct the ligature...@>=
- sort_ptr = 1; /* point to the next label that will be needed */
- for (acti=0; acti<nl; acti++) if (activity[acti] != pass_through)
- { i = acti; @<Take care of commenting out unreachable steps@>;
- @<Output any labels for step $i$@>;
- @<Output step $i$ of the ligature/kern program@>;
- }
- if (level == 2) right() /* the final step was unreachable */
-
- @ @<Output any labels...@>=
- while( i == label_table[sort_ptr].rr )
- { left(); out("LABEL");
- if (label_table[sort_ptr].cc == 256) out(" BOUNDARYCHAR");
- else out_char(label_table[sort_ptr].cc);
- right(); incr(sort_ptr);
- }
-
- @ @<Take care of commenting out...@>=
- if (activity[i] == unreachable)
- { if (level == 1)
- { left(); out("COMMENT THIS PART OF THE PROGRAM IS NEVER USED!"); out_ln();
- }
- }
- else if (level == 2) right()
-
- @ @<Output step $i$...@>=
- { k = (index)(lig_step(i));
- if (tfm(k) > stop_flag)
- { if (256*tfm(k+2)+tfm(k+3) >= nl)
- bad("Ligature unconditional stop command address is too big.");
- @.Ligature unconditional stop...@>
- }
- else if (tfm(k+2) >= kern_flag) @<Output a kern step@>@;
- else @<Output a ligature step@>;
- if (tfm(k) > 0)
- if (level == 1) @<Output either \.{SKIP} or \.{STOP}@>;
- }
-
- @ The \.{SKIP} command is a bit tricky, because we will be omitting all
- inaccessible commands.
-
- @<Output either...@>=
- { if (tfm(k) >= stop_flag) out("(STOP)");
- else { count = 0;
- for (ai=i+1; ai<=i+tfm(k); ai++) if (activity[ai] == accessible) incr(count);
- out("(SKIP D "); outl(count); outc(')'); /* possibly $count=0$, so who cares */
- }
- out_ln();
- }
-
- @ @<Output a kern step@>=
- { if (nonexistent(tfm(k+1))) if (tfm(k+1) != boundary_char)
- correct_bad_char("Kern step for", k+1);
- @.Kern step for nonexistent...@>
- left(); out("KRN"); out_char(tfm(k+1));
- r = 256*(tfm(k+2)-kern_flag)+tfm(k+3);
- if (r >= nk)
- { bad("Kern index too large.");
- @.Kern index too large@>
- out(" R 0.0");
- }
- else out_fix((index)(kern(r)));
- right();
- }
-
- @ @<Output a ligature step@>=
- { if (nonexistent(tfm(k+1))) if (tfm(k+1) != boundary_char)
- correct_bad_char("Ligature step for", k+1);
- @.Ligature step for nonexistent...@>
- if (nonexistent(tfm(k+3)))
- correct_bad_char("Ligature step produces the", k+3);
- @.Ligature step produces...@>
- left(); r = tfm(k+2);
- if ((r == 4) || ((r > 7) && (r != 11)))
- { fprintf(output, "%s\n", "Ligature step with nonstandard code changed to LIG");
- r = 0; tfm(k+2) = 0;
- }
- if (r%4 > 1) outc('/');
- out("LIG");
- if (odd(r)) outc('/');
- while( r > 3 )
- { outc('>'); r = r-4;
- }
- out_char(tfm(k+1)); out_char(tfm(k+3)); right();
- }
-
- @ The last thing on \.{TFtoPL}'s agenda is to go through the
- list of |char_info| and spew out the information about each individual
- character.
-
- @<Do the characters@>=
- sort_ptr = 0; /* this will suppress `\.{STOP}' lines in ligature comments */
- for (c=bc; c<=ec; c++) if (width_index(c) > 0)
- { if (chars_on_line == 8)
- { fprintf(output, "%s\n", " "); chars_on_line = 1;
- }
- else { if (chars_on_line > 0) putc(' ', output);
- incr(chars_on_line);
- }
- print_octal(c); /* progress report */
- left(); out("CHARACTER"); out_char(c); out_ln();
- @<Output the character's width@>;
- if (height_index(c) > 0) @<Output the character's height@>;
- if (depth_index(c) > 0) @<Output the character's depth@>;
- if (italic_index(c) > 0) @<Output the italic correction@>;
- switch (tag(c)) {
- case no_tag: do_nothing; break;
- case lig_tag: @<Output the applicable part of the ligature/kern
- program as a comment@>; break;
- case list_tag: @<Output the character link unless there is a problem@>; break;
- case ext_tag: @<Output an extensible character recipe@>; break;
- }/* there are no other cases */
- right();
- }
-
- @ @<Output the character's width@>=
- { left(); out("CHARWD");
- if (width_index(c) >= nw) range_error("Width")@;
- else out_fix((index)(width(c)));
- right();
- }
-
- @ @<Output the character's height@>=
- if (height_index(c) >= nh) range_error("Height")@;
- @.Height index for char...@>
- else { left(); out("CHARHT"); out_fix((index)(height(c))); right();
- }
-
- @ @<Output the character's depth@>=
- if (depth_index(c) >= nd) range_error("Depth")@;
- @.Depth index for char@>
- else { left(); out("CHARDP"); out_fix((index)(depth(c))); right();
- }
-
- @ @<Output the italic correction@>=
- if (italic_index(c) >= ni) range_error("Italic correction")@;
- @.Italic correction index for char...@>
- else { left(); out("CHARIC"); out_fix((index)(italic(c))); right();
- }
-
- @ @<Output the applicable part of the ligature...@>=
- { left(); out("COMMENT"); out_ln();@/
- i = remainder(c);
- r = (UWORD)(lig_step(i));
- if (tfm(r) > stop_flag) i = 256*tfm(r+2)+tfm(r+3);
- do { @<Output step...@>;
- if (tfm(k) >= stop_flag) i = nl;
- else i = i+1+tfm(k);
- } while(i < nl);
- right();
- }
-
- @ We want to make sure that there is no cycle of characters linked together
- by |list_tag| entries, since such a cycle would get \TeX\ into an endless
- loop. If such a cycle exists, the routine here detects it when processing
- the largest character code in the cycle.
-
- @<Output the character link unless there is a problem@>=
- { r = remainder(c);
- if (nonexistent(r))
- { bad_char("Character list link to", r); reset_tag(c);
- @.Character list link...@>
- }
- else { while ((r < c) && (tag(r) == list_tag)) r = remainder(r);
- if (r == c)
- { bad("Cycle in a character list!");
- @.Cycle in a character list@>
- fprintf(output, "Character "); print_octal(c);
- fprintf(output, "%s\n", " now ends the list.");
- reset_tag(c);
- }
- else { left(); out("NEXTLARGER"); out_char(remainder(c));
- right();
- }
- }
- }
-
- @ @<Output an extensible character recipe@>=
- if (remainder(c) >= ne)
- { range_error("Extensible"); reset_tag(c);
- @.Extensible index for char@>
- }
- else { left(); out("VARCHAR"); out_ln();
- @<Output the extensible pieces that exist@>;
- right();
- }
-
- @ @<Output the extensible pieces that...@>=
- for (k=0; k<=3; k++) if ((k == 3) || (tfm(exten(c)+k) > 0))
- { left();
- switch (k) {
- case 0: out("TOP"); break;
- case 1: out("MID"); break;
- case 2: out("BOT"); break;
- case 3: out("REP"); break;
- }
- if (nonexistent(tfm(exten(c)+k))) out_char(c);
- else out_char(tfm(exten(c)+k));
- right();
- }
-
- @ Some of the extensible recipes may not actually be used, but \TeX\ will
- complain about them anyway if they refer to nonexistent characters.
- Therefore \.{TFtoPL} must check them too.
-
- @<Check the extensible recipes@>=
- if (ne > 0) for (c=0; c<ne; c++) for (d=0; d<=3; d++)
- { k = (index)(4*(exten_base+c)+d);
- if ((tfm(k) > 0) || (d == 3))
- { if (nonexistent(tfm(k)))
- { bad_char("Extensible recipe involves the", tfm(k));
- @.Extensible recipe involves...@>
- if (d < 3) tfm(k) = 0;
- }
- }
- }
-
- @* Checking for ligature loops.
- We have programmed almost everything but the most interesting calculation of
- all, which has been saved for last as a special treat. \TeX's extended ligature
- mechanism allows unwary users to specify sequences of ligature replacements
- that never terminate. For example, the pair of commands
- $$\.{(fdivLIG $x$ $y$) (fdivLIG $y$ $x$)}$$
- alternately replaces character $x$ by character $y$ and vice versa. A similar
- loop occurs if \.{(LIGfdiv $z$ $y$)} occurs in the program for $x$ and
- \.{(LIGfdiv $z$ $x$)} occurs in the program for $y$.
-
- More complicated loops are also possible. For example, suppose the ligature
- programs for $x$ and $y$ are
- $$\vcenter{\halign{#\hfil\cr
- \.{(LABEL $x$)(fdivLIGfdiv $z$ $w$)(fdivLIGfdiv> $w$ $y$)} \dots,\cr
- \.{(LABEL $y$)(LIG $w$ $x$)} \dots;\cr}}$$
- then the adjacent characters $xz$ change to $xwz$, $xywz$, $xxz$, $xxwz$,
- \dots, ad infinitum.
-
- @ To detect such loops, \.{TFtoPL} attempts to evaluate the function
- $f(x,y)$ for all character pairs $x$ and~$y$, where $f$ is defined as
- follows: If the current character is $x$ and the next character is
- $y$, we say the ``cursor'' is between $x$ and $y$; when the cursor
- first moves past $y$, the character immediately to its left is
- $f(x,y)$. This function is defined if and only if no infinite loop is
- generated when the cursor is between $x$ and~$y$.
-
- The function $f(x,y)$ can be defined recursively. It turns out that all pairs
- $(x,y)$ belong to one of five classes. The simplest class has $f(x,y)=y$; this
- happens if there's no ligature between $x$ and $y$, or in the cases
- \.{LIGfdiv>} and \.{fdivLIGfdiv>>}. Another simple class arises when there's a
- \.{LIG} or \.{fdivLIG>} between $x$ and~$y$, generating the character~$z$;
- then $f(x,y)=z$. Otherwise we always have $f(x,y)$ equal to
- either $f(x,z)$ or $f(z,y)$ or $f(f(x,z),y)$, where $z$ is the inserted
- ligature character.
-
- The first two of these classes can be merged; we can also consider
- $(x,y)$ to belong to the simple class when $f(x,y)$ has been evaluated.
- For technical reasons we allow $x$ to be 256 (for the boundary character
- at the left) or 257 (in cases when an error has been detected).
-
- For each pair $(x,y)$ having a ligature program step, we store
- $(x,y)$ in a hash table from which the values $z$ and $class$ can be read.
-
- @d simple 0 /* $f(x,y)=z$ */
- @d left_z 1 /* $f(x,y)=f(z,y)$ */
- @d right_z 2 /* $f(x,y)=f(x,z)$ */
- @d both_z 3 /* $f(x,y)=f(f(x,z),y)$ */
- @d pending 4 /* $f(x,y)$ is being evaluated */
-
- @<Glob...@>=
- LONG @!hash[hash_size+1]; /* $256x+y+1$ for $x\le257$ and $y\le255$ */
- UBYTE @!class[hash_size+1];
- UWORD @!lig_z[hash_size+1];
- hash_size_type @!hash_ptr; /* the number of nonzero entries in |hash| */
- hash_size_type @!hash_list[hash_size+1]; /* list of those nonzero entries */
- hash_size_type @!h,@!hh; /* indices into the hash table */
- UWORD @!x_lig_cycle,@!y_lig_cycle; /* problematic ligature pair */
-
- @ @<Check for ligature cycles@>=
- hash_ptr = 0; y_lig_cycle = 256;
- for (hh=0; hh<=hash_size; hh++) hash[hh] = 0; /* clear the hash table */
- for (c=bc; c<=ec; c++) if (tag(c) == lig_tag)
- { i = remainder(c);
- if (tfm(lig_step(i)) > stop_flag)
- i = 256*tfm(lig_step(i)+2)+tfm(lig_step(i)+3);
- @<Enter data for character $c$ starting at location |i| in the hash table@>;
- }
- if (bchar_label < nl)
- { c = 256; i = bchar_label;
- @<Enter data for character $c$ starting at location |i| in the hash table@>;
- }
- if (hash_ptr == hash_size)
- { fprintf(output, "%s\n", "Sorry, I haven't room for so many ligature/kern pairs!");
- @.Sorry, I haven't room...@>
- goto final_end;
- }
- for (hh=1; hh<=hash_ptr; hh++)
- { r = hash_list[hh];
- if (class[r] > simple) /* make sure $f$ is defined */
- r = f(r, (index)((hash[r]-1)/256), (index)((hash[r]-1)%256));
- }
- if (y_lig_cycle < 256)
- { fprintf(output, "Infinite ligature loop starting with ");
- @.Infinite ligature loop...@>
- if (x_lig_cycle == 256) fprintf(output, "boundary");@+else print_octal(x_lig_cycle);
- fprintf(output, " and "); print_octal(y_lig_cycle); fprintf(output, "%s\n", "!");
- out("(INFINITE LIGATURE LOOP MUST BE BROKEN!)"); goto final_end;
- }
-
- @ @<Enter data for character $c$...@>=
- do { hash_input(); k = tfm(lig_step(i));
- if (k >= stop_flag) i = nl;
- else i = i+1+k;
- }
- while(i < nl);
-
- @ We use an ``ordered hash table'' with linear probing, because such a table
- is efficient when the lookup of a random key tends to be unsuccessful.
-
- @c void hash_input(void) /* enter data for character |c| and command |i| */
- {
- UBYTE @!cc; /* class of data being entered */
- UBYTE @!zz; /* function value or ligature character being entered */
- UBYTE @!y; /* the character after the cursor */
- LONG @!key; /* value to be stored in |hash| */
- LONG @!t; /* temporary register for swapping */
- @#
- if (hash_ptr == hash_size) return;
- @<Compute the command parameters |y|, |cc|, and |zz|@>;
- key = 256*c+y+1; h = (index)((1009*key) % hash_size);
- while( hash[h] > 0 )
- { if (hash[h] <= key)
- { if (hash[h] == key) return; /* unused ligature command */
- t = hash[h]; hash[h] = key; key = t; /* do ordered-hash-table insertion */
- t = class[h]; class[h] = cc; cc = t; /* namely, do a swap */
- t = lig_z[h]; lig_z[h] = zz; zz = t;
- }
- if (h > 0) decr(h);@+else h = hash_size;
- }
- hash[h] = key; class[h] = cc; lig_z[h] = zz; incr(hash_ptr); hash_list[hash_ptr] = h;
- }
-
- @ We must store kern commands as well as ligature commands, because the former
- might make the latter inapplicable.
-
- @<Compute the command param...@>=
- k = (index)(lig_step(i)); y = tfm(k+1); t = tfm(k+2); cc = simple; zz = tfm(k+3);
- if (t >= kern_flag) zz = y;
- else { switch (t) {
- case 0:
- case 6: do_nothing; break; /* \.{LIG},\.{/LIG>} */
- case 5:
- case 11: zz = y; break; /* \.{LIG/>}, \.{/LIG/>>} */
- case 1:
- case 7: cc = left_z; break; /* \.{LIG/}, \.{/LIG/>} */
- case 2: cc = right_z; break; /* \.{/LIG} */
- case 3: cc = both_z; break; /* \.{/LIG/} */
- }/* there are no other cases */
- }
-
- @ Evaluation of $f(x,y)$ is handled by two mutually recursive procedures.
- Kind of a neat algorithm, generalizing a depth-first search.
-
- @c index f(index @!h, index @!x, index @!y);@t\2@>
- /* compute $f$ for arguments known to be in |hash[h]| */
- index eval(index @!x, index @!y) /* compute $f(x,y)$ with hashtable lookup */
- {
- LONG @!key; /* value sought in hash table */
- @#
- key = 256*x+y+1; h = (index)((1009*key) % hash_size);
- while( hash[h] > key )
- if (h > 0) decr(h);@+else h = hash_size;
- if (hash[h] < key) return(y); /* not in ordered hash table */
- else return(f(h, x, y));
- }
-
- @ Pascal's beastly convention for |forward| declarations prevents us from
- saying |function f(h,x,y:index):index| here.
-
- @c index f(index @!h, index @!x, index @!y)
- {
- switch (class[h]) {
- case simple: do_nothing; break;
- case left_z: class[h] = pending; lig_z[h] = eval(lig_z[h],y); class[h] = simple;
- break;
- case right_z: class[h] = pending; lig_z[h] = eval(x,lig_z[h]); class[h] = simple;
- break;
- case both_z: class[h] = pending; lig_z[h] = eval(eval(x,lig_z[h]),y);
- class[h] = simple; break;
- case pending: x_lig_cycle = x; y_lig_cycle = y; lig_z[h] = 257; class[h] = simple;
- break;/* the value 257 will break all cycles, since it's not in |hash| */
- }/* there are no other cases */
- return(lig_z[h]);
- }
-
- @* The main program.
- The routines sketched out so far need to be packaged into separate procedures,
- on some systems, since some \PASCAL\ compilers place a strict limit on the
- size of a routine. The packaging is done here in an attempt to avoid some
- system-dependent changes.
-
- First comes the |organize| procedure, which reads the input data and
- gets ready for subsequent events. If something goes wrong, the routine
- returns |false|.
-
- @c boolean organize(void)
- {
- index tfm_ptr; /* an index into |tfm| */
- @#
- @<Read the whole input file@>;@/
- @<Set subfile sizes |lh|, |bc|, \dots, |np|@>;@/
- @<Compute the base addresses@>;@/
- return(true);
- final_end: return(false);
- }
-
- @ Next we do the simple things.
-
- @c void do_simple_things(void)
- {
- UWORD i; /* an index to words of a subfile */
- @#
- @<Do the header@>;@/
- @<Do the parameters@>;@/
- @<Check the |fix_word| entries@>@/
- }
-
- @ And then there's a routine for individual characters.
-
- @c void do_characters(void)
- {
- byte @!c; /* character being done */
- index @!k; /* a random index */
- lig_size_type @!ai; /* index into |activity| */
- @#
- @<Do the characters@>;@/
- }
-
- @ Here is where \.{TFtoPL} begins and ends.
- @c
- @<The function |append_extension|@>@/
- int main(int argc, char ** argv)
- {
- @<Scan the file names@>@/
- initialize();@/
- if (!organize()) return(1);
- do_simple_things();@/
- @<Do the ligatures and kerns@>;
- @<Check the extensible recipes@>;
- do_characters(); fprintf(output, "%s\n", ".");@/
- if (level != 0) fprintf(output, "%s\n", "This program isn't working!");
- @.This program isn't working@>
- if (!perfect)
- out("(COMMENT THE TFM FILE WAS BAD, SO THE DATA HAS BEEN CHANGED!)");
- @.THE TFM FILE WAS BAD...@>
- return(0);
- final_end:return(1);
- }
-
- @* System-dependent changes.
- This section should be replaced, if necessary, by changes to the program
- that are necessary to make \.{TFtoPL} work at a particular installation.
- It is usually best to design your change file so that all changes to
- previous sections preserve the section numbering; then everybody's version
- will be consistent with the printed program. More extensive changes,
- which introduce new sections, can be inserted here; then only the index
- itself will get a new section number.
- @^system dependencies@>
-
- @
- @<The function |append_extension|@>=
- void append_extension(char * name, char * extension)
- {
- char * p;
-
- p = name + strlen(name)-1;
- while (p > name) {
- if ((*p == '.') || (*p == '\\') || (*p == ':'))
- break;
- p--;
- }
- if ((p == name) || (*p != '.')) {
- strcat(name, ".");
- strcat(name, extension);
- }
- else {
- p++;
- if (strcmp(p, extension) != 0)
- strcpy(p, extension);
- }
- }
-
- @
- @<Scan the file names@>=
- argv++;
- argc--;
- if (argc == 0) {
- fprintf(stderr, "! No TFM file speciefied\n");
- exit(1);
- }
- strcpy(tfm_name, *argv);
- append_extension(tfm_name, "tfm");
- argv++;
- argc--;
- @#
- if (argc != 0) {
- strcpy(pl_name, *argv);
- argv++;
- argc--;
- }
- else
- strcpy(pl_name, tfm_name);
- append_extension(pl_name, "pl");
-
- @* Index.
- Pointers to error messages appear here together with the section numbers
- where each ident\-i\-fier is used.
-